Abstract

We study the capacity regions of broadcast channels with binary inputs and symmetric 
outputs. We also study the partial order induced by the more capable ordering of broadcast channels 
restricted to this class. This study leads to some surprising connections between 
various notions of dominance of receivers. The results also help us isolate some classes of 
symmetric channels where the best known inner and outer bounds differ. 

1 Introduction 

In [1], Cover introduced the notion of a broadcast channel through which one sender transmits 
information to two or more receivers. For the purpose of this paper we focus our attention on 
broadcast channels with precisely two receivers. 

Definition: A broadcast channel (BC) consists of an input alphabet $\mathcal{X}$, output alphabets $\mathcal{Y}_1$ and $\mathcal{Y}_2$, and a probability transition function $p(y_1, y_2|x)$. A $((2^{nR_1}, 2^{nR_2}), n)$ code for a broadcast channel consists of an encoder

$$x^n : 2^{nR_1} \times 2^{nR_2} \to \mathcal{X}^n$$

and two decoders

$$\hat{W}_1 : \mathcal{Y}_1^n \to 2^{nR_1}, \qquad \hat{W}_2 : \mathcal{Y}_2^n \to 2^{nR_2}.$$

The probability of error $P_e^{(n)}$ is defined to be the probability that the decoded message is not equal to the transmitted message, i.e.,

$$P_e^{(n)} = P\left(\{\hat{W}_1(Y_1^n) \neq W_1\} \cup \{\hat{W}_2(Y_2^n) \neq W_2\}\right),$$

where the message $(W_1, W_2)$ is assumed to be uniformly distributed over $2^{nR_1} \times 2^{nR_2}$.

A rate pair $(R_1, R_2)$ is said to be achievable for the broadcast channel if there exists a sequence of $((2^{nR_1}, 2^{nR_2}), n)$ codes with $P_e^{(n)} \to 0$. The capacity region of the broadcast channel is the closure of the set of achievable rates. The capacity region of the general two-receiver discrete memoryless broadcast channel is unknown.

The capacity region is known for several special cases in which there is a "dominant receiver", such 
as degraded, less noisy, more capable, essentially less noisy, and essentially more capable broadcast channels. In all of these cases 
superposition coding is optimal. An interesting observation in [7] was that the notions of more 
capable and essentially less noisy may not be compatible with each other. 

*The work of S. Shamai was supported by the Israel Science Foundation (ISF). 
In this paper, we study in detail the notions of more capable receivers and essentially less 
noisy receivers by focusing on an important class of binary-input symmetric-output (BISO) 
broadcast channels, commonly used in coding theory. We establish a number of results; some of the more 
interesting ones are summarized below. 

1.1 Summary of selected results 

Here are some of the results established in this paper. 

• Any BISO channel with capacity C is more capable than the binary symmetric channel with 
capacity C (Corollary 1). 

• The binary erasure channel with capacity C is more capable than any BISO channel with 
capacity C (Corollary 2). 

• Any two BISO channels with the same capacity whose output alphabets have cardinality at most 
3 are more capable comparable, i.e., one receiver is more capable than the other receiver 
(Corollary 3). 

• For any two BISO channels with the same capacity, receiver $Y_1$ is more capable than receiver 
$Y_2$ if and only if receiver $Y_2$ is essentially less noisy than $Y_1$. (The two orderings go in reverse directions!) 
(Lemma 4) 

• The superposition coding region is the capacity region of a BISO broadcast channel if either 
of the component channels is a BSC or a BEC (Corollary 4). 

• For two BISO channels with the same capacity, superposition coding is optimal if and only if 
the channels are more capable comparable (Corollary 5). 

• For two BISO channels with the same capacity, Marton's inner bound differs from the outer bound [6] 
unless the channels are more capable comparable (Theorem 3). 

• We also show that it suffices to consider $U \to X$ to be a BSC when computing the 
boundary of the superposition coding region for BISO broadcast channels (Lemma 8). This 
vastly generalizes a result of Wyner and Ziv [10] for the degraded BSC broadcast channel. 



1.2 Preliminaries 

Definition 1. [3] A channel $F_1 : \mathcal{X} \to \mathcal{Y}_1$ is said to be more capable than the channel $F_2 : \mathcal{X} \to \mathcal{Y}_2$, 
denoted $F_1 \succeq_{\mathrm{mc}} F_2$, if $I(X;Y_1) \ge I(X;Y_2)$ for all $p(x)$. 

Definition 2. [7] A class of distributions $\mathcal{P} = \{p(x)\}$ on the input alphabet $\mathcal{X}$ is said to be a 
sufficient class of distributions for a 2-receiver broadcast channel if the following holds: given any 
triple of random variables $(U, V, X)$ such that $(U, V) \to X \to (Y_1, Y_2)$ forms a Markov chain, there 
exists a distribution $q(u, v, x)$ (also obeying the Markov relationship $(U, V) \to X \to (Y_1, Y_2)$) that 
satisfies 

$$\begin{aligned}
& q(x) \in \mathcal{P}, \\
& I(U;Y_i)_p \le I(U;Y_i)_q, \quad i = 1, 2, \\
& I(V;Y_i)_p \le I(V;Y_i)_q, \quad i = 1, 2, \\
& I(X;Y_i|U)_p \le I(X;Y_i|U)_q, \quad i = 1, 2, \\
& I(X;Y_i|V)_p \le I(X;Y_i|V)_q, \quad i = 1, 2, \\
& I(X;Y_i)_p \le I(X;Y_i)_q, \quad i = 1, 2.
\end{aligned} \tag{1}$$

Definition 3. [7] A channel $F_1 : \mathcal{X} \to \mathcal{Y}_1$ is essentially less noisy compared to a channel $F_2 : \mathcal{X} \to \mathcal{Y}_2$, 
denoted $F_1 \succeq_{\mathrm{eln}} F_2$, if there exists a sufficient class of distributions $\mathcal{P}$ such that whenever 
$p(x) \in \mathcal{P}$, for all $U \to X \to (Y_1, Y_2)$ we have 

$$I(U;Y_2) \le I(U;Y_1).$$

In this paper, we restrict ourselves to a class $\mathcal{C}$ of discrete memoryless channels with binary 
inputs and symmetric outputs (BISO), as defined below. 

Definition 4. A discrete memoryless channel with input alphabet $\mathcal{X} = \{0, 1\}$ and output alphabet 
$\mathcal{Y} = \{k : -l \le k \le l\}$ is said to belong to class $\mathcal{C}$ (or BISO) if 

$$p_k = P(Y = k|X = 0) = P(Y = -k|X = 1), \quad -l \le k \le l.$$

The binary symmetric channel (BSC) and the binary erasure channel (BEC) are examples of channels 
that belong to the class $\mathcal{C}$. It is easy to see that the uniform input distribution is the capacity-achieving 
distribution for any channel in $\mathcal{C}$. 

Remark 1. Since the output $k = 0$ can be split equally into $0^+$ and $0^-$ with probabilities $p_{0^+} = p_{0^-} = p_0/2$, we 
need only consider $k = \pm 1, \dots, \pm l$, and we use $\{p_k, p_{-k} : k = 1, \dots, l\}$ to denote the transition probabilities, 
sometimes shortened to $\{p_k, p_{-k}\}_k$. 

A partition $P$ of an interval $[a, b]$ is a finite sequence of points $\{t_k\}_k$ such that $a = t_0 < t_1 < t_2 < 
\dots < t_N = b$. A partition $P$ is finer than $Q$ if the points of $P$ contain those of $Q$. A common 
refinement of two partitions $P$ and $Q$ is a new partition consisting of all the points of $P$ and $Q$. 

Definition 5. (BISO partition and BISO curve) 

For a BISO channel with transition probabilities $\{p_k, p_{-k}\}_k$, rearrange the values $h\left(\frac{p_{-k}}{p_k + p_{-k}}\right)$, where $h(\cdot)$ is the binary entropy function, in ascending 
order and denote the permutation by $\pi$. The BISO partition is defined as the partition of $[0, 1]$ with 
points $t_k = \sum_{i=1}^{k} (p_{-\pi(i)} + p_{\pi(i)})$, where we set $t_0 = 0$. The BISO curve is defined as the step function $f(t)$ 
such that $f(t) = h\left(\frac{p_{-\pi(k)}}{p_{\pi(k)} + p_{-\pi(k)}}\right)$ on $(t_{k-1}, t_k]$, and $f(0) = 0$. 

For the channel $\mathrm{BSC}(p)$, the partition is $t_0 = 0$, $t_1 = 1$ and the curve is $f(t) = h(p)$ 
on $(0, 1]$. For the channel $\mathrm{BEC}(e)$, the partition is $t_0 = 0$, $t_1 = 1 - e$, $t_2 = 1$, and the curve 
is $f(t) = 0$ on $(0, 1 - e]$ and $f(t) = 1$ on $(1 - e, 1]$. 

Definition 6. (Lorenz curve of a BISO channel) 

For a BISO channel with BISO curve $f(t)$, the Lorenz curve (or cumulative function) $F(t)$ is 
defined as $F(t) = \int_0^t f(\tau)\, d\tau$. 

Properties of the Lorenz curve: 

Since $0 \le f(t) \le 1$ and $f(t)$ is non-decreasing on $[0, 1]$, we have 

1. $F(t)$ is non-negative, piecewise linear and convex. 

2. The slope of each line segment of $F(t)$ is at most 1. 

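By definition of the BISO curve, the length of the $k$-th interval is $p_{\pi(k)} + p_{-\pi(k)}$, and, conditioned on $|Y| = \pi(k)$, the channel acts as a BSC with crossover probability $h^{-1}(f(t))$ for $t \in (t_{k-1}, t_k]$, where $h^{-1}$ denotes the inverse of $h$ on $[0, 1/2]$. Therefore, with $x = P(X = 0)$ and $*$ the binary convolution, 

$$I(X;Y) = \int_0^1 h\left(x * h^{-1}(f(\tau))\right) d\tau - \int_0^1 f(\tau)\, d\tau = \int_0^1 h\left(x * h^{-1}(f(\tau))\right) d\tau - F(1). \tag{2}$$

Thus, a finer partition does not change $I(X;Y)$, and in particular does not change the channel capacity. Indeed, the 
capacity (attained at $x = 1/2$) is $C = 1 - F(1)$. 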
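Formula (2) lends itself to a quick numerical check. The Python sketch below is illustrative only: the helper names (`biso_pieces`, `mi_via_lorenz`, `mi_direct`) and the pair-list input format are our own assumptions, and a zero output is assumed to have been split into $0^+$ and $0^-$ beforehand, as in Remark 1. It sorts the sub-channels by $f_k = h(q_k)$ to form the BISO curve, evaluates (2) with $x = P(X = 0)$, and compares the result with $H(Y) - H(Y|X)$ computed directly from the transition probabilities.

```python
import math


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)


def biso_pieces(pairs):
    """pairs: [(p_k, p_{-k})] for k = 1..l. Returns [(delta_k, q_k)] sorted by f_k = h(q_k),
    with delta_k = p_k + p_{-k} and q_k = min(p_k, p_{-k}) / delta_k = h^{-1}(f_k) in [0, 1/2]."""
    pieces = [(pk + pmk, min(pk, pmk) / (pk + pmk)) for pk, pmk in pairs if pk + pmk > 0]
    return sorted(pieces, key=lambda piece: h(piece[1]))


def lorenz_total(pieces):
    """F(1) = sum_k delta_k f_k."""
    return sum(d * h(q) for d, q in pieces)


def mi_via_lorenz(pieces, x):
    """Equation (2): sum_k delta_k h(x * h^{-1}(f_k)) - F(1), with x = P(X = 0)."""
    return sum(d * h(conv(x, q)) for d, q in pieces) - lorenz_total(pieces)


def mi_direct(pairs, x):
    """I(X;Y) = H(Y) - H(Y|X) computed from the transition probabilities."""
    hy = hyx = 0.0
    for pk, pmk in pairs:
        # Output k has P(k|X=0) = p_k, P(k|X=1) = p_{-k}; output -k has the reverse.
        for a, b in ((pk, pmk), (pmk, pk)):
            py = x * a + (1 - x) * b
            if py > 0:
                hy -= py * math.log2(py)
            for px, pyx in ((x, a), (1 - x, b)):
                if pyx > 0:
                    hyx -= px * pyx * math.log2(pyx)
    return hy - hyx


# Example: P(Y=1|X=0) = 0.75, P(Y=0|X=0) = 0.2, P(Y=-1|X=0) = 0.05,
# with the zero output split into 0+ and 0- (probability 0.1 each).
pairs = [(0.75, 0.05), (0.1, 0.1)]
pieces = biso_pieces(pairs)
capacity = 1 - lorenz_total(pieces)
for x in (0.1, 0.3, 0.5):
    print(f"x={x}: (2) gives {mi_via_lorenz(pieces, x):.6f}, direct gives {mi_direct(pairs, x):.6f}")
print(f"C = 1 - F(1) = {capacity:.6f}")
```

Because $h(x * q)$ is unchanged under $q \leftrightarrow 1 - q$, storing $q_k = \min(p_k, p_{-k})/(p_k + p_{-k})$ gives $h^{-1}(f_k)$ directly, so no numerical inversion of $h$ is needed in this sketch.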

2 Main 

2.1 On partial orderings and capacity regions of BISO broadcast channels 
2.1.1 On more capable comparability of BISO channels 

We now establish a sufficient condition for two BISO channels to be comparable 
under the more capable partial ordering. Before stating this sufficient condition, 
we need the following three lemmas. 

Lemma 1. Let $X \to Y$ and $X \to Z$ be BISO channels with BISO curves $f(t)$ and $g(t)$, respectively. 
Let the common refinement of their two BISO partitions be $\{t_k : k = 0, \dots, N\}$, and let $\delta_k = t_k - t_{k-1}$. 
Then 

$$F(t_i) = \sum_{k=1}^{i} \delta_k f(t_k) \le \sum_{k=1}^{i} \delta_k g(t_k) = G(t_i), \quad i = 1, \dots, N,$$

if and only if the Lorenz curves satisfy $F(t) \le G(t)$ for all $t \in [0, 1]$. 

Proof. The "if" direction is obvious. We just need to prove the other direction, i.e., that $F(t_i) \le G(t_i)$ for all $i$ implies 
$F(t) \le G(t)$. We argue by contradiction. Let $t^*$ be a point such that $F(t^*) > G(t^*)$. Clearly 
$t^* \in (t_{j-1}, t_j)$ for some $j$. Since $f$ and $g$ are constant on $(t_{j-1}, t_j]$ (the partition is a common refinement) and $F(t_{j-1}) \le G(t_{j-1})$ by assumption, it is necessary that $f(t) > g(t)$ 
for $t \in (t_{j-1}, t_j)$. However, integrating from $t^*$ to $t_j$, we then have $F(t_j) > G(t_j)$, which contradicts 
the assumption that the inequality holds at every $t_i$. □ 
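Lemma 1 means that Lorenz-curve domination can be tested at finitely many points. The minimal Python sketch below assumes each BISO curve is given as a list of (interval length, slope) pairs already sorted by slope; the helper names are hypothetical. It compares the test on the common refinement with a brute-force test on a dense grid, using a BEC and a BSC as examples.

```python
import math


def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def lorenz(curve, t):
    """F(t) = integral of the step function f over [0, t]; curve = [(delta_k, f_k)]."""
    total, start = 0.0, 0.0
    for delta, slope in curve:
        if t <= start:
            break
        total += slope * min(delta, t - start)
        start += delta
    return total


def partition_points(curve):
    """Points t_0 = 0 < t_1 < ... < t_N of the BISO partition."""
    points, t = [0.0], 0.0
    for delta, _ in curve:
        t += delta
        points.append(t)
    return points


def dominated_on_refinement(f_curve, g_curve, tol=1e-12):
    """Lemma 1 test: F(t_i) <= G(t_i) at every point of the common refinement."""
    points = sorted(set(partition_points(f_curve) + partition_points(g_curve)))
    return all(lorenz(f_curve, t) <= lorenz(g_curve, t) + tol for t in points)


def dominated_on_grid(f_curve, g_curve, n=10001, tol=1e-12):
    """Brute-force test of F(t) <= G(t) on n equally spaced points of [0, 1]."""
    return all(lorenz(f_curve, i / (n - 1)) <= lorenz(g_curve, i / (n - 1)) + tol
               for i in range(n))


bec = [(0.6, 0.0), (0.4, 1.0)]  # BEC(e = 0.4): f = 0 on (0, 0.6], f = 1 on (0.6, 1]
bsc = [(1.0, h(0.1))]           # BSC(0.1): f = h(0.1) on (0, 1]
for f_curve, g_curve in ((bec, bsc), (bsc, bec)):
    print(dominated_on_refinement(f_curve, g_curve), dominated_on_grid(f_curve, g_curve))
```

The two tests agree in both directions, as Lemma 1 predicts; the refinement test needs only a handful of evaluations instead of thousands.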

The following lemma is well-known. 

Lemma 2. (Lemma 2 in [10]) 

The function $h(x * h^{-1}(y))$ is convex in $y$ (strictly convex for $x \notin \{0, 1/2, 1\}$). This is the key ingredient of Mrs. Gerber's Lemma. 
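Lemma 2 can be probed numerically by looking at second differences of $\phi_x(y) = h(x * h^{-1}(y))$ on a uniform grid; positive values are consistent with strict convexity. This is only a sanity check, not a proof, and `h_inv` is a simple bisection of our own rather than a library routine.

```python
import math


def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def h_inv(y, iterations=100):
    """Inverse of h restricted to [0, 1/2], computed by bisection (h is increasing there)."""
    lo, hi = 0.0, 0.5
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def phi(x, y):
    """phi_x(y) = h(x * h^{-1}(y)), where * is the binary convolution."""
    q = h_inv(y)
    return h(x * (1 - q) + q * (1 - x))


def min_second_difference(x, n=200):
    """Smallest second difference of phi_x on a uniform grid of [0, 1]."""
    step = 1.0 / n
    return min(phi(x, (i - 1) * step) - 2 * phi(x, i * step) + phi(x, (i + 1) * step)
               for i in range(1, n))


for x in (0.05, 0.11, 0.3):
    print(x, min_second_difference(x))
print(0.0, min_second_difference(0.0))  # phi_0(y) = y is linear, so this is ~0
```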
Lemma 3. (Lemma 1 in [?]) 

Let $x_1, \dots, x_l$ and $y_1, \dots, y_l$ be nondecreasing sequences of real numbers. Let $\xi_1, \dots, \xi_l$ be a sequence 
of real numbers such that 

$$\sum_{j=k}^{l} \xi_j x_j \ge \sum_{j=k}^{l} \xi_j y_j, \quad k = 1, \dots, l,$$

with equality for $k = 1$. Then for any convex function $\Lambda$, 

$$\sum_{j=1}^{l} \xi_j \Lambda(x_j) \ge \sum_{j=1}^{l} \xi_j \Lambda(y_j).$$
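The hypotheses of Lemma 3 say that the $\xi$-weighted tail sums of $x$ dominate those of $y$, with equal totals; this is exactly what Lemma 1 provides for the values $f(t_k)$ and $g(t_k)$ in the proof of Theorem 1. The sketch below checks the hypotheses and the conclusion on a small hand-made example. It is a sanity check with illustrative numbers of our choosing, not a proof, and `lemma3_check` is a hypothetical helper.

```python
import math


def lemma3_check(xi, x, y, lam, tol=1e-12):
    """Return (hypotheses_hold, lhs, rhs) for Lemma 3 with weights xi and convex lam.

    Hypotheses: x and y nondecreasing; tail sums sum_{j>=k} xi_j x_j >= sum_{j>=k} xi_j y_j
    for every k, with equality for k = 1 (i.e. for the full sums).
    """
    n = len(xi)
    monotone = all(a <= b for a, b in zip(x, x[1:])) and all(a <= b for a, b in zip(y, y[1:]))
    tails = all(sum(xi[j] * x[j] for j in range(k, n))
                >= sum(xi[j] * y[j] for j in range(k, n)) - tol for k in range(n))
    equal_total = abs(sum(w * a for w, a in zip(xi, x)) - sum(w * b for w, b in zip(xi, y))) <= tol
    lhs = sum(w * lam(a) for w, a in zip(xi, x))
    rhs = sum(w * lam(b) for w, b in zip(xi, y))
    return monotone and tails and equal_total, lhs, rhs


# Interval lengths of a common refinement and two nondecreasing step-function values.
xi = [0.2, 0.5, 0.3]
x = [0.0, 0.4, 1.0]
y = [0.3, 0.52, 0.6]
for name, lam in (("t^2", lambda t: t * t), ("exp", math.exp), ("|t - 0.45|", lambda t: abs(t - 0.45))):
    ok, lhs, rhs = lemma3_check(xi, x, y, lam)
    print(f"{name}: hypotheses hold = {ok}, lhs = {lhs:.4f} >= rhs = {rhs:.4f}")
```

Swapping the roles of `x` and `y` violates the tail-sum hypothesis, so the lemma then says nothing.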
Theorem 1. (A sufficient condition) 

Let $X \to Y$ and $X \to Z$ be BISO channels with Lorenz curves $F(t)$ and $G(t)$, respectively, and 
let $F(1) = G(1)$, i.e., the channels have the same capacity. If $F(t) \le G(t)$ for all $t \in [0, 1]$, then $Y$ is more capable than $Z$. 

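Proof. Using Lemma 1 we know that 

$$F(t_i) = \sum_{k=1}^{i} \delta_k f(t_k) \le \sum_{k=1}^{i} \delta_k g(t_k) = G(t_i), \quad i = 1, \dots, N,$$

and since $F(1) = G(1)$ we have equality at $i = N$. Subtracting from the totals, the tail sums satisfy $\sum_{k=j}^{N} \delta_k f(t_k) \ge \sum_{k=j}^{N} \delta_k g(t_k)$ for $j = 1, \dots, N$, with equality for $j = 1$. Using Lemma 3 and noting that $f(t_k)$ and 
$g(t_k)$ are both nondecreasing in $k$, we have 

$$\sum_{j=1}^{N} \delta_j \Lambda(f(t_j)) \ge \sum_{j=1}^{N} \delta_j \Lambda(g(t_j))$$

for any convex function $\Lambda$. Taking $\Lambda(y) = h(x * h^{-1}(y)) - y$, which is convex by Lemma 2, we obtain 

$$\sum_{j=1}^{N} \delta_j h\left(x * h^{-1}(f(t_j))\right) - \sum_{j=1}^{N} \delta_j f(t_j) \ge \sum_{j=1}^{N} \delta_j h\left(x * h^{-1}(g(t_j))\right) - \sum_{j=1}^{N} \delta_j g(t_j).$$

From (2), this is equivalent to 

$$I(X;Y) \ge I(X;Z) \quad \forall p(x).$$

Thus the theorem is established. □ 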
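Theorem 1 can be illustrated numerically. In the sketch below, both channels are built from sub-BSCs $(\delta_k, q_k)$ as in (2). We take $Y$ to be noiseless with probability 0.3 and $\mathrm{BSC}(0.2)$ otherwise, and $Z$ to be noiseless with probability 0.1 and $\mathrm{BSC}(p)$ otherwise, with $p$ tuned so that $F(1) = G(1)$. These are illustrative choices of ours, and the helpers are hypothetical. The script checks $F \le G$ on the common refinement and then confirms $I(X;Y) \ge I(X;Z)$ on a grid of input distributions.

```python
import math


def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def h_inv(y):
    """Inverse of h on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = (lo + hi) / 2
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def conv(a, b):
    """Binary convolution."""
    return a * (1 - b) + b * (1 - a)


def mutual_info(pieces, x):
    """I(X;Y) with P(X=0) = x; pieces = [(delta_k, q_k)], sub-channel k being a BSC(q_k)."""
    return sum(d * (h(conv(x, q)) - h(q)) for d, q in pieces)


def lorenz(pieces, t):
    """Lorenz curve F(t): integrate the slopes h(q_k) in ascending order."""
    total, start = 0.0, 0.0
    for d, q in sorted(pieces, key=lambda piece: h(piece[1])):
        total += h(q) * max(0.0, min(d, t - start))
        start += d
    return total


y_pieces = [(0.3, 0.0), (0.7, 0.2)]
p = h_inv(lorenz(y_pieces, 1.0) / 0.9)  # makes 0.9 h(p) = F_Y(1), i.e. equal capacities
z_pieces = [(0.1, 0.0), (0.9, p)]

refinement = [0.0, 0.1, 0.3, 1.0]  # union of the two BISO partitions
lorenz_ok = all(lorenz(y_pieces, t) <= lorenz(z_pieces, t) + 1e-12 for t in refinement)
gaps = [mutual_info(y_pieces, i / 100) - mutual_info(z_pieces, i / 100) for i in range(101)]
print("F_Y <= F_Z on the common refinement:", lorenz_ok)
print("min over x of I(X;Y) - I(X;Z):", min(gaps))
```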

For reasons that will become apparent later (Lemma 5), it is useful to zoom in on the following 
subclass of BISO channels. 

Let $\mathcal{C}(C)$ be the class of BISO channels with capacity $C$. 

For instance, $\mathrm{BSC}(p)$ belongs to this class when $1 - h(p) = C$. Similarly, $\mathrm{BEC}(e)$ belongs to 
this class when $1 - e = C$. Let $F(C)$ denote an arbitrary BISO channel belonging to this class. 
With a slight abuse of notation, we denote by $\mathrm{BSC}(C)$ and $\mathrm{BEC}(C)$ the binary symmetric channel 
and the binary erasure channel with capacity $C$, respectively. 

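Corollary 1. $F(C) \succeq_{\mathrm{mc}} \mathrm{BSC}(C)$. 

Proof. Let $G(t)$ denote the Lorenz curve of $F(C)$. From Theorem 1, it suffices that the Lorenz curves satisfy $G(t) \le F_{\mathrm{BSC}}(t)$, $t \in [0, 1]$. Observe 
that $G(0) = F_{\mathrm{BSC}}(0) = 0$, $G(1) = F_{\mathrm{BSC}}(1)$, and that $F_{\mathrm{BSC}}(t)$ is the straight line connecting $(0, F_{\mathrm{BSC}}(0))$ and 
$(1, F_{\mathrm{BSC}}(1))$. The convexity of $G(t)$ (Property 1) implies that $G(t) \le F_{\mathrm{BSC}}(t)$, $t \in [0, 1]$. □ 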

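Corollary 2. $\mathrm{BEC}(C) \succeq_{\mathrm{mc}} F(C)$. 

Proof. As above, it suffices that the Lorenz curves satisfy $F_{\mathrm{BEC}}(t) \le G(t)$, $t \in [0, 1]$, where $e = 1 - C$. 
Since $F_{\mathrm{BEC}}(t) = 0$ for $t \in [0, 1 - e]$, we have $F_{\mathrm{BEC}}(t) \le G(t)$ for $t \in [0, 1 - e]$. Combining $F_{\mathrm{BEC}}(1) = G(1)$ 
with the slope comparison $F'_{\mathrm{BEC}}(t) = f_{\mathrm{BEC}}(t) = 1 \ge g(t) = G'(t)$, $t \in (1 - e, 1]$, we also have 
$F_{\mathrm{BEC}}(t) \le G(t)$ for $t \in [1 - e, 1]$. □ 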
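Corollaries 1 and 2 together say that $I(X;\mathrm{BSC}(C)) \le I(X;F(C)) \le I(X;\mathrm{BEC}(C))$ for every input distribution. The sketch below checks this sandwich on a few randomly generated BISO channels. The random construction (mixtures of sub-BSCs with random weights and crossovers) and the fixed seed are our own illustrative choices.

```python
import math
import random


def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def h_inv(y):
    """Inverse of h on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = (lo + hi) / 2
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def conv(a, b):
    """Binary convolution."""
    return a * (1 - b) + b * (1 - a)


def mutual_info(pieces, x):
    """I(X;Y) with P(X=0) = x; pieces = [(delta_k, q_k)], sub-channel k being a BSC(q_k)."""
    return sum(d * (h(conv(x, q)) - h(q)) for d, q in pieces)


def random_biso(rng, l=4):
    """A random BISO channel: l sub-channels BSC(q_k) used with probabilities delta_k."""
    weights = [rng.random() for _ in range(l)]
    total = sum(weights)
    return [(w / total, 0.5 * rng.random()) for w in weights]


rng = random.Random(7)
xs = [i / 100 for i in range(101)]
results = []
for _ in range(5):
    chan = random_biso(rng)
    cap = mutual_info(chan, 0.5)         # C = 1 - F(1)
    bsc = [(1.0, h_inv(1 - cap))]        # BSC(C)
    bec = [(cap, 0.0), (1 - cap, 0.5)]   # BEC(C): noiseless w.p. C, erased w.p. 1 - C
    low = min(mutual_info(chan, x) - mutual_info(bsc, x) for x in xs)   # Corollary 1
    high = min(mutual_info(bec, x) - mutual_info(chan, x) for x in xs)  # Corollary 2
    results.append((cap, low, high))
    print(f"C = {cap:.4f}: min(I_F - I_BSC) = {low:.2e}, min(I_BEC - I_F) = {high:.2e}")
```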
Figure 1: Lorenz curves for BISO channels $Y$, $Z$ with the same capacity and output alphabets of size 3. 
2.1.2 Relation to information combining 

Some of the results, more precisely Corollaries 1 and 2, can be obtained via an almost direct 
application of the results in [9]. From [9], for $U \to X \sim \mathrm{BSC}(s)$, if $Y$ is a BISO receiver (with the 
same capacity as the BEC and the BSC), then 

$$I(X;U, Y_{\mathrm{BSC}}) \le I(X;U, Y) \le I(X;U, Y_{\mathrm{BEC}}),$$

which then yields $I(X;Y_{\mathrm{BSC}}|U) \le I(X;Y|U) \le I(X;Y_{\mathrm{BEC}}|U)$. But by symmetry, conditioning on 
$U$, where $U \to X \sim \mathrm{BSC}(s)$, is the same as taking $X$ with $P(X = 0) = 1 - s$. One could also obtain the 
same conclusion by using the results in [7]. Here, however, we have used a different approach, via 
Theorem 1, to establish the extremal properties of the BSC and the BEC. 

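Corollary 3. Let $F_1(C)$ and $F_2(C)$ be two BISO channels in $\mathcal{C}(C)$ whose output alphabet sizes are at 
most 3. Then either $F_1(C) \succeq_{\mathrm{mc}} F_2(C)$ or $F_2(C) \succeq_{\mathrm{mc}} F_1(C)$, i.e., two such channels are always more 
capable comparable. 

Proof. For a BISO channel $X \to Y$ with transition probabilities $\{p_{-1}, p_0, p_1\}$, the output $k = 0$ is split equally 
into $0^+$ and $0^-$. Thus the Lorenz curve $F(t)$ consists of two line segments: one with slope $h\left(\frac{p_{0^-}}{p_{0^+} + p_{0^-}}\right) = h(1/2) = 1$, 
and the other with slope at most 1. Given two Lorenz curves of this kind, $F(t)$ and $G(t)$, with 
$F(1) = G(1)$, the difference $F(t) - G(t)$ vanishes at $t = 0$, is linear up to the first breakpoint, monotone between the two breakpoints (where exactly one of the curves has slope 1), and constant, hence equal to $F(1) - G(1) = 0$, after the second breakpoint. Therefore it cannot change sign: either $F(t) \le G(t)$ for all $t \in [0, 1]$ or $F(t) \ge G(t)$ for all $t \in [0, 1]$ (Figure 1). 
According to Theorem 1, these two channels are more capable comparable. □ 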
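The argument for Corollary 3 can be traced numerically. In the sketch below, a ternary BISO channel is modelled as an erasure with probability $e$ and a $\mathrm{BSC}(p)$ otherwise. We fix $Y$ with $e = 0.2$, $p = 0.05$ (our own illustrative values), pick $Z$ with $e_2 = 0.35$ and the crossover that matches the capacity, and then check that one Lorenz curve lies below the other and that the corresponding channel is more capable on a grid of inputs.

```python
import math


def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def h_inv(y):
    """Inverse of h on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = (lo + hi) / 2
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def conv(a, b):
    """Binary convolution."""
    return a * (1 - b) + b * (1 - a)


def mutual_info(pieces, x):
    """I(X;Y) with P(X=0) = x; pieces = [(delta_k, q_k)], sub-channel k being a BSC(q_k)."""
    return sum(d * (h(conv(x, q)) - h(q)) for d, q in pieces)


def lorenz(pieces, t):
    """Lorenz curve F(t): integrate the slopes h(q_k) in ascending order."""
    total, start = 0.0, 0.0
    for d, q in sorted(pieces, key=lambda piece: h(piece[1])):
        total += h(q) * max(0.0, min(d, t - start))
        start += d
    return total


def ternary(e, p):
    """Outputs {-1, 0, 1}: erasure (output 0) w.p. e, otherwise a BSC(p)."""
    return [(1 - e, p), (e, 0.5)]


y = ternary(0.2, 0.05)
cap = mutual_info(y, 0.5)
e2 = 0.35
z = ternary(e2, h_inv(1 - cap / (1 - e2)))  # (1 - e2)(1 - h(p2)) = C, same capacity as y

refinement = [0.0, 1 - e2, 0.8, 1.0]        # the breakpoints 1 - e of both curves
z_below_y = all(lorenz(z, t) <= lorenz(y, t) + 1e-12 for t in refinement)
y_below_z = all(lorenz(y, t) <= lorenz(z, t) + 1e-12 for t in refinement)
gaps = [mutual_info(z, i / 100) - mutual_info(y, i / 100) for i in range(101)]
print("F_Z <= F_Y:", z_below_y, " F_Y <= F_Z:", y_below_z)
print("min over x of I(X;Z) - I(X;Y):", min(gaps))
```

Here the channel with the larger erasure probability has the lower Lorenz curve, so by Theorem 1 it is the more capable one.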

Remark 2. Not all BISO channels with the same capacity are more capable comparable. A 
counterexample is the following. Consider a broadcast channel $X \to (Y, Z)$ with BISO components and transition probabilities 

$$P(Y = i|X = 0) = a_i, \quad -2 \le i \le 2,$$
$$P(Z = j|X = 0) = b_j, \quad -2 \le j \le 2,$$

where $a_{-2} = 0.061$, $a_{-1} = a_1 = \frac{1 - 10a_{-2}}{2}$, $a_2 = 9a_{-2}$ and $b_{-2} = 0.0634977$, $b_{-1} = \frac{1 - b_{-2}}{5}$, $b_1 = 
\frac{4(1 - b_{-2})}{5}$ (normalization then forces $a_0 = b_0 = b_2 = 0$). One can verify that the channels have the same capacity but are not more capable 
comparable. 
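The claim in Remark 2 can be checked numerically. The sketch below takes $a_{-1} = a_1 = (1 - 10a_{-2})/2$, $b_{-1} = (1 - b_{-2})/5$, $b_1 = 4(1 - b_{-2})/5$ and the remaining entries zero, which is our reading of the values above. It compares the two capacities and shows that $I(X;Y) - I(X;Z)$ changes sign as $P(X = 0)$ varies, so neither channel is more capable than the other. The helper names are illustrative.

```python
import math


def mutual_info(px0, channel):
    """I(X;Y) in bits for P(X=0) = px0; channel maps y -> (P(y|X=0), P(y|X=1))."""
    hy = hyx = 0.0
    for a, b in channel.values():
        py = px0 * a + (1 - px0) * b
        if py > 0:
            hy -= py * math.log2(py)
        for px, pyx in ((px0, a), (1 - px0, b)):
            if pyx > 0:
                hyx -= px * pyx * math.log2(pyx)
    return hy - hyx


def biso(p_given_0):
    """BISO channel from P(Y=k|X=0), using P(Y=k|X=1) = P(Y=-k|X=0)."""
    outputs = sorted(set(p_given_0) | {-k for k in p_given_0})
    return {k: (p_given_0.get(k, 0.0), p_given_0.get(-k, 0.0)) for k in outputs}


a_m2 = 0.061
a = {-2: a_m2, -1: (1 - 10 * a_m2) / 2, 1: (1 - 10 * a_m2) / 2, 2: 9 * a_m2}
b_m2 = 0.0634977
b = {-2: b_m2, -1: (1 - b_m2) / 5, 1: 4 * (1 - b_m2) / 5}  # b_0 = b_2 = 0
Y, Z = biso(a), biso(b)

cap_y, cap_z = mutual_info(0.5, Y), mutual_info(0.5, Z)
gaps = [mutual_info(i / 100, Y) - mutual_info(i / 100, Z) for i in range(51)]
print(f"C_Y = {cap_y:.7f}, C_Z = {cap_z:.7f}")
print(f"min_x [I(X;Y) - I(X;Z)] = {min(gaps):.2e}, max_x = {max(gaps):.2e}")
```

Near $P(X = 0) = 0.05$ the receiver $Z$ is better, while near $P(X = 0) = 0.3$ the receiver $Y$ is better, which is what "not more capable comparable" means.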
2.1.3 On more capable and essentially less noisy orderings in BISO channels 

In this section we establish that these two partial orderings, restricted to $\mathcal{C}(C)$, are inverses of 
each other (!). This is counter-intuitive, since more capable and essentially less noisy are both notions 
expressing that one receiver is superior to another receiver. 

Below (for a complete argument see Lemma 1 in [7]) we note that the uniform input distribution 
forms a sufficient class for a broadcast channel consisting of two channels $F_1, F_2 \in \mathcal{C}$. 

Claim 1. Consider a binary-input broadcast channel whose component channels $F_1 : X \to Y_1$ 
and $F_2 : X \to Y_2$ are both output-symmetric, i.e., $F_1, F_2 \in \mathcal{C}$. Then the uniform input distribution 
$P(X = 0) = 1/2$ forms a sufficient class. 

Proof. The following construction suffices; we leave the details to the reader. Let $j, k \in \{0, 1\}$, 
and define 

$$q(U = (u, j), V = (v, k), X = x) = \begin{cases} \frac{1}{2} P(U = u, V = v, X = x \oplus j), & j = k, \\ 0, & j \neq k. \end{cases}$$

□ 

Lemma 4. Let $F_1, F_2 \in \mathcal{C}(C)$; then $F_1 \succeq_{\mathrm{mc}} F_2 \iff F_2 \succeq_{\mathrm{eln}} F_1$. 

Proof. Assume $F_1 \succeq_{\mathrm{mc}} F_2$. From Claim 1 we know that $P(X = 0) = 1/2$ is a sufficient distribution for 
the channels $F_1, F_2$. Therefore, when $P(X = 0) = 1/2$, we have for all $U$ such that $U \to X \to (Y_1, Y_2)$ 

$$\begin{aligned}
I(U;Y_1) &= I(X;Y_1) - I(X;Y_1|U) \\
&= C - I(X;Y_1|U) \\
&= I(X;Y_2) - I(X;Y_1|U) \\
&= I(U;Y_2) + I(X;Y_2|U) - I(X;Y_1|U) \\
&\le I(U;Y_2),
\end{aligned}$$

where the last inequality follows from $F_1 \succeq_{\mathrm{mc}} F_2$ applied to each conditional input distribution $p(x|u)$, which gives $I(X;Y_2|U) \le I(X;Y_1|U)$. Since $P(X = 0) = 1/2$ is a sufficient class of 
input distributions for a broadcast channel comprising $F_1, F_2$, it follows from the definition that 
$F_2 \succeq_{\mathrm{eln}} F_1$. 

Conversely, assume $F_2 \succeq_{\mathrm{eln}} F_1$; we argue by contradiction. Suppose there is a value $x$ such that, 
when $P(X = 0) = x$, $I(X;Y_2) - I(X;Y_1) = \delta > 0$. Consider a $U$ such that $P(U = 0) = 
P(U = 1) = 1/2$ and $P(X = 0|U = 0) = x = P(X = 1|U = 1)$. Observe that, by symmetry, 
$I(X;Y_2|U) - I(X;Y_1|U) = \delta > 0$. However, since $P(X = 0) = 1/2$, using a similar decomposition we 
see that 

$$I(U;Y_1) = I(U;Y_2) + I(X;Y_2|U) - I(X;Y_1|U) = I(U;Y_2) + \delta > I(U;Y_2),$$

contradicting the assumption $F_2 \succeq_{\mathrm{eln}} F_1$. Therefore $F_1 \succeq_{\mathrm{mc}} F_2$. □ 
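The decomposition in the proof of Lemma 4 can be checked on the pair $\mathrm{BSC}(C)$, $\mathrm{BEC}(C)$ with $C = 0.5$, an illustrative choice. The sketch restricts to auxiliaries $U$ of the symmetric form used in the converse, $P(U = 0) = 1/2$ and $P(X = 0|U = 0) = x = P(X = 1|U = 1)$, rather than all $U$. It verifies $I(U;Y) = C - I(X;Y)|_{P(X=0)=x}$ and shows the reversal: the BEC wins on $I(X;Y)$ while the BSC wins on $I(U;Y)$. The helpers are hypothetical.

```python
import math


def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def h_inv(y):
    """Inverse of h on [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = (lo + hi) / 2
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mutual_info(px0, channel):
    """I(X;Y) for binary X with P(X=0) = px0; channel = [(P(y|X=0), P(y|X=1)), ...]."""
    hy = hyx = 0.0
    for a, b in channel:
        py = px0 * a + (1 - px0) * b
        if py > 0:
            hy -= py * math.log2(py)
        for px, pyx in ((px0, a), (1 - px0, b)):
            if pyx > 0:
                hyx -= px * pyx * math.log2(pyx)
    return hy - hyx


def i_u_y(x, channel):
    """I(U;Y) for uniform binary U with P(X=0|U=0) = x = P(X=1|U=1), so X is uniform."""
    # Composite channel U -> Y: mix the rows of X -> Y according to P(x|u).
    composite = [(x * a + (1 - x) * b, (1 - x) * a + x * b) for a, b in channel]
    return mutual_info(0.5, composite)


cap = 0.5
p = h_inv(1 - cap)  # BSC(C) crossover probability
e = 1 - cap         # BEC(C) erasure probability
bsc = [(1 - p, p), (p, 1 - p)]
bec = [(1 - e, 0.0), (e, e), (0.0, 1 - e)]
xs = [i / 100 for i in range(51)]

identity_err = max(abs(i_u_y(x, ch) - (mutual_info(0.5, ch) - mutual_info(x, ch)))
                   for ch in (bsc, bec) for x in xs)
bec_more_capable = all(mutual_info(x, bec) >= mutual_info(x, bsc) - 1e-12 for x in xs)
bsc_wins_on_u = all(i_u_y(x, bsc) >= i_u_y(x, bec) - 1e-12 for x in xs)
print(identity_err, bec_more_capable, bsc_wins_on_u)
```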

The following lemma is an immediate consequence of Corollaries 1 and 2 and Lemma 4. 

Lemma 5. Let $\mathrm{BSC}(C)$ denote a binary symmetric channel with capacity $C$, $\mathrm{BEC}(C)$ a 
binary erasure channel with capacity $C$, and $F(C)$ an arbitrary binary-input symmetric-output 
channel, i.e., $F \in \mathcal{C}$, with capacity $C$. We have 

(i) $\mathrm{BEC}(C) \succeq_{\mathrm{mc}} F(C) \succeq_{\mathrm{mc}} \mathrm{BSC}(C)$, 

(ii) $\mathrm{BSC}(C) \succeq_{\mathrm{eln}} F(C) \succeq_{\mathrm{eln}} \mathrm{BEC}(C)$. 

This leads us to one of the main results in this paper. 

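Theorem 2. Let $\mathrm{BSC}(C)$ denote a binary symmetric channel with capacity $C$, $\mathrm{BEC}(C)$ a 
binary erasure channel with capacity $C$, and $F(C)$ an arbitrary binary-input symmetric-output 
channel, i.e., $F \in \mathcal{C}$, with capacity $C$. For any three numbers $0 \le C_1 \le C_2 \le C_3 \le 1$ we have 

(i) $\mathrm{BEC}(C_3) \succeq_{\mathrm{mc}} F(C_2) \succeq_{\mathrm{mc}} \mathrm{BSC}(C_1)$, 

(ii) $\mathrm{BSC}(C_3) \succeq_{\mathrm{eln}} F(C_2) \succeq_{\mathrm{eln}} \mathrm{BEC}(C_1)$. 

Proof. If $C_a \le C_b$, then $\mathrm{BSC}(C_a)$ and $\mathrm{BEC}(C_a)$ are degraded versions of $\mathrm{BSC}(C_b)$ and $\mathrm{BEC}(C_b)$, 
respectively. Hence, from Lemma 5, we have 

$$\mathrm{BEC}(C_3) \succeq_{\mathrm{mc}} \mathrm{BEC}(C_2) \succeq_{\mathrm{mc}} F(C_2) \succeq_{\mathrm{mc}} \mathrm{BSC}(C_2) \succeq_{\mathrm{mc}} \mathrm{BSC}(C_1),$$

$$\mathrm{BSC}(C_3) \succeq_{\mathrm{eln}} \mathrm{BSC}(C_2) \succeq_{\mathrm{eln}} F(C_2) \succeq_{\mathrm{eln}} \mathrm{BEC}(C_2) \succeq_{\mathrm{eln}} \mathrm{BEC}(C_1).$$

□ 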

The following corollary is immediate. 

Corollary 4. The superposition coding region is the capacity region of a BISO broadcast channel if 
either of the component channels is a BSC or a BEC. 

Proof. Superposition coding is optimal both for more capable comparable channels [2] and for 
essentially less noisy comparable channels [7]. From Theorem 2, if one of the channels is 
a BSC or a BEC, then the channels are either more capable comparable or essentially less noisy 
comparable (depending on which of the two capacities is larger). □ 

Remark 3. In [7] the capacity region of the BSC/BEC broadcast channel was established. Corollary 
4 generalizes this result by requiring only that one of the BISO channels be a BEC or a BSC. 

2.2 Comparison of inner and outer bounds for BISO channels 

The following are some commonly used inner bounds (or achievable rate regions) for the capacity 
region (CR): 

• Time-division region (TD): This region is characterized by the set of points 

$$R_1 \le \alpha C_1, \qquad R_2 \le (1 - \alpha) C_2, \qquad 0 \le \alpha \le 1,$$

where $C_1$ and $C_2$ are the channel capacities for the two receivers, respectively. The rates are 
achieved by transmitting at capacity $C_1$ to the first receiver for a fraction $\alpha$ of the time, and 
at capacity $C_2$ to the second receiver for the remaining fraction. 
• Randomized time-division region (RTD): This corresponds to a time-division strategy, except 
that the sequence of time slots assigned to each receiver is itself drawn from a codebook, 
which conveys additional information. The rates are characterized by 

$$\begin{aligned}
R_1 &\le I(W;Y_1) + P(W = 0) I(X;Y_1|W = 0), \\
R_2 &\le I(W;Y_2) + P(W = 1) I(X;Y_2|W = 1), \\
R_1 + R_2 &\le \min\{I(W;Y_1), I(W;Y_2)\} + P(W = 0) I(X;Y_1|W = 0) + P(W = 1) I(X;Y_2|W = 1),
\end{aligned}$$

over binary random variables $W$ such that $W \to X \to (Y_1, Y_2)$ is Markov. The binary 
random variable $W$ indicates, for each slot, which of the two receivers is being served. 

• Marton's inner bound (MIB): This is the best known achievable rate region. The rates are 
characterized by 

$$\begin{aligned}
R_1 &\le I(U, W;Y_1), \\
R_2 &\le I(V, W;Y_2), \\
R_1 + R_2 &\le \min\{I(W;Y_1), I(W;Y_2)\} + I(U;Y_1|W) + I(V;Y_2|W) - I(U;V|W),
\end{aligned}$$

over random variables $(U, V, W)$ such that $(U, V, W) \to X \to (Y_1, Y_2)$ is Markov. Observe 
that setting $U = X$, $V = \emptyset$ when $W = 0$ and $V = X$, $U = \emptyset$ when $W = 1$ reduces MIB to the 
RTD region. 

Lemma 6 ([?]). For binary-input broadcast channels, the maximum sum rate implied by 
Marton's inner bound (MIB) matches that of the randomized time-division (RTD) region. 

• Outer bound (OB): The following region [6] is an outer bound to the capacity region: 
the union of rate pairs satisfying 

$$\begin{aligned}
R_1 &\le I(U;Y_1), \\
R_2 &\le I(V;Y_2), \\
R_1 + R_2 &\le I(U;Y_1) + I(X;Y_2|U), \\
R_1 + R_2 &\le I(V;Y_2) + I(X;Y_1|V),
\end{aligned}$$

over all $(U, V) \to X \to (Y_1, Y_2)$. 

Remark 4. For BISO channels, since $P(X = 0) = 1/2$ is a common sufficient distribution, it can 
be shown that OB matches an earlier outer bound due to Körner and Marton [5]. 

We adopt the notation in Table 1. 

Lemma 7. Consider a 2-receiver broadcast channel where $X \to Y_1$ and $X \to Y_2$ are 
BISO channels with transition probabilities $\{q_k, q_{-k} : 1 \le k \le N\}$ and $\{p_j, p_{-j} : 1 \le j \le N\}$, 
respectively. Consider the region formed by taking the union of rate pairs $(R_1, R_2)$ 
satisfying 

$$\begin{aligned}
R_2 &\le I(U;Y_2), \\
R_2 + R_1 &\le I(U;Y_2) + I(X;Y_1|U), \\
R_1 &\le I(X;Y_1),
\end{aligned}$$

over all $p(u)p(x|u)p(y_1, y_2|x)$. Then the same region can be realized by restricting to a binary $U$ 
such that $U \to X \sim \mathrm{BSC}(s)$ and $P(X = 0) = 1/2$. 
Table 1: Notation 

Abbr.   Meaning                                       Abbr.    Meaning
TD      time-division region                          BSC      binary symmetric channel
RTD     randomized time-division region               BEC      binary erasure channel
MIB     Marton's inner bound                          e.l.n.   essentially less noisy
CR      capacity region                               e.m.c.   essentially more capable
OB      outer bound (Körner-Marton, Nair-El Gamal)    *        binary convolution
BISO    binary input symmetric output                 h(.)     binary entropy function

Proof. The proof is presented in the Appendix. □ 

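Let $U \to X \sim \mathrm{BSC}(s_1)$, $V \to X \sim \mathrm{BSC}(s_2)$ and $P(X = 0) = 1/2$. For $i = 1, 2$, let $f_i(s)$ denote $I(U;Y_i)$ when $U \to X \sim \mathrm{BSC}(s)$, i.e., 
$P(X = 1|U = 0) = P(X = 0|U = 1) = s$, with $P(X = 0) = 1/2$; in particular $I(U;Y_1) = f_1(s_1)$ and $I(V;Y_2) = f_2(s_2)$. It is clear from symmetry 
that $f_1(s) = f_1(1 - s)$ and $f_2(s) = f_2(1 - s)$. 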

From Lemma 7 and Remark 4 it follows that OB can be written as the union of rate pairs 
$(R_1, R_2)$ satisfying 

$$\begin{aligned}
R_1 &\le f_1(s_1), \\
R_2 &\le f_2(s_2), \\
R_1 + R_2 &\le f_1(s_1) + C - f_2(s_1), \\
R_1 + R_2 &\le f_2(s_2) + C - f_1(s_2)
\end{aligned} \tag{3}$$

for some $0 \le s_1, s_2 \le 1/2$. 
Let 

$$I = \{s \in [0, 0.5] : f_1(s) > f_2(s)\}, \qquad J = \{s \in [0, 0.5] : f_1(s) < f_2(s)\}.$$
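For a concrete pair, the functions $f_1, f_2$ and the sets $I, J$ can be tabulated directly from their definition. The sketch below uses the two channels of Remark 2, as read there ($Y_1 = Y$, $Y_2 = Z$), computes $f_i(s) = I(U;Y_i)$ for $U \to X \sim \mathrm{BSC}(s)$ with uniform $X$ on a grid of $s \in (0, 1/2)$, and checks condition (c) of Theorem 3. The grid resolution and helper names are our own choices.

```python
import math


def mutual_info(px0, channel):
    """I(X;Y) in bits for P(X=0) = px0; channel maps y -> (P(y|X=0), P(y|X=1))."""
    hy = hyx = 0.0
    for a, b in channel.values():
        py = px0 * a + (1 - px0) * b
        if py > 0:
            hy -= py * math.log2(py)
        for px, pyx in ((px0, a), (1 - px0, b)):
            if pyx > 0:
                hyx -= px * pyx * math.log2(pyx)
    return hy - hyx


def biso(p_given_0):
    """BISO channel from P(Y=k|X=0), using P(Y=k|X=1) = P(Y=-k|X=0)."""
    outputs = sorted(set(p_given_0) | {-k for k in p_given_0})
    return {k: (p_given_0.get(k, 0.0), p_given_0.get(-k, 0.0)) for k in outputs}


def f(channel, s):
    """f(s) = I(U;Y) for uniform binary U with P(X=1|U=0) = P(X=0|U=1) = s."""
    composite = {y: ((1 - s) * a + s * b, s * a + (1 - s) * b) for y, (a, b) in channel.items()}
    return mutual_info(0.5, composite)


Y1 = biso({-2: 0.061, -1: 0.195, 1: 0.195, 2: 0.549})
b_m2 = 0.0634977
Y2 = biso({-2: b_m2, -1: (1 - b_m2) / 5, 1: 4 * (1 - b_m2) / 5})
cap = mutual_info(0.5, Y1)

grid = [i / 200 for i in range(1, 100)]  # s in (0, 1/2)
f1 = {s: f(Y1, s) for s in grid}
f2 = {s: f(Y2, s) for s in grid}
I_set = [s for s in grid if f1[s] > f2[s]]
J_set = [s for s in grid if f1[s] < f2[s]]
best = max(f1[s1] + f2[s2] for s1 in I_set for s2 in J_set)
print(f"|I| = {len(I_set)}, |J| = {len(J_set)}, max f1(s1) + f2(s2) = {best:.4f}, C = {cap:.4f}")
```

Both sets are non-empty, consistent with the two channels not being more capable comparable, and $f_1(s_1) + f_2(s_2)$ exceeds $C$ for suitable $s_1 \in I$, $s_2 \in J$.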

The following result relates the various bounds to each other and to whether 
the channels are more capable comparable. 

Theorem 3. Let $F_1, F_2 \in \mathcal{C}(C)$. Then the following are equivalent: 

(a) $F_1$ and $F_2$ are not more capable comparable; 

(b) TD $\subsetneq$ OB; 

(c) there exist $s_1 \in I$, $s_2 \in J$ such that $f_1(s_1) + f_2(s_2) > C$; 

(d) TD $\subsetneq$ MIB; 

(e) MIB $\subsetneq$ OB. 

Proof. The proof of this equivalence is presented in the Appendix. □ 

Corollary 5. For two BISO channels with the same capacity, superposition coding is optimal if 
and only if the channels are more capable comparable. 
Proof. If the superposition coding region is indeed the capacity region, then every achievable pair satisfies $R_1 + R_2 \le 
I(X;Y_1) \le C$. Since the two channels have the same capacity $C$, the TD region already achieves $R_1 + R_2 = C$, so TD 
is optimal, i.e., TD = MIB = CR. From Theorem 3 ((a) $\Leftrightarrow$ (d)), the channels are then more capable comparable. Conversely, if the channels are more capable comparable, superposition coding is optimal by [2]. □ 

Remark 5. A characterization of when superposition coding is optimal for 2-receiver broadcast 
channels is open in general. It is known that superposition coding is optimal when the channels are 
either essentially more capable comparable or essentially less noisy comparable [3], two incompatible 
notions. However, a converse statement is still unknown. 

Observation 1. From Remark 2 we know that there exists a pair of channels $F_1, F_2 \in \mathcal{C}(C)$ that 
are not more capable comparable. Hence, from Theorem 3, the capacity region is 
strictly larger than TD. However, if we replace $F_2$ by $\mathrm{BEC}(C)$, a more capable channel, then the 
capacity region of the broadcast channel formed by $F_1$ and $\mathrm{BEC}(C)$ is the TD region (by Corollary 2 the two channels are more capable comparable, so by Theorem 3 OB = TD). Thus, 
replacing a receiver by a more capable one can strictly reduce the capacity region. 

This observation leads to an operational definition of a better receiver and a partial order as 
follows. 



2.2.1 A new partial order 

We now introduce a natural operational partial order among broadcast channels. 

Definition 7. Receiver $Z_2$ is a better receiver than $Y_2$ if the capacity region of $X \to (Y_1, Z_2)$ 
contains that of $X \to (Y_1, Y_2)$ for every channel $X \to Y_1$. In other words, replacing receiver $Y_2$ 
by receiver $Z_2$ never decreases the capacity region. 

Remark 6. Note that the capacity region of a broadcast channel depends only on the marginal 
distributions $X \to Y_1$ and $X \to Y_2$, and hence the definition makes sense. 

From Observation 1 we know that a more capable receiver is not necessarily a better receiver. 
However, we now show that if $Z_2$ is a less noisy receiver than $Y_2$, then $Z_2$ is indeed a better receiver 
than $Y_2$. 

Claim 2. If Z2 is a less noisy receiver than Y2, then Z2 is a better receiver than Y2. 

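Proof. The capacity region of a discrete memoryless broadcast channel has the following $n$-letter 
characterization. Consider the region $\mathcal{R}_n$ defined as the union of rate pairs $(R_1, R_2)$ that satisfy 

$$R_1 \le \frac{1}{n} I(U;Y_1^n), \qquad R_2 \le \frac{1}{n} I(V;Y_2^n)$$

for some $p(u)p(v)p(x^n|u, v)$. It is known that the capacity region is $\lim_{n \to \infty} \mathcal{R}_n$. (This is folklore: it is 
clear that this region is achievable, and a converse follows by setting $U = M_1$ and $V = M_2$ and applying 
Fano's inequality.) Observe that, for $j = n, \dots, 1$, 

$$\begin{aligned}
I(V;Y_2^j, Z_{2,j+1}^n) &= I(V;Y_2^{j-1}, Z_{2,j+1}^n) + I(V;Y_{2j}|Y_2^{j-1}, Z_{2,j+1}^n) \\
&\le I(V;Y_2^{j-1}, Z_{2,j+1}^n) + I(V;Z_{2j}|Y_2^{j-1}, Z_{2,j+1}^n) \\
&= I(V;Y_2^{j-1}, Z_{2,j}^n),
\end{aligned}$$

where the inequality uses the fact that $Z_2$ is less noisy than $Y_2$ (applied conditionally on $(Y_2^{j-1}, Z_{2,j+1}^n)$). Chaining these inequalities from $j = n$ down to $j = 1$, we obtain $I(V;Y_2^n) \le I(V;Z_2^n)$. The claim follows from 
the expression of the capacity region stated above. □ 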
3 Conclusion 

We look at the partial orders induced by the more capable and less noisy relations in binary- 
input symmetric-output (BISO) broadcast channels. We establish the capacity regions for a class 
of them and also show various other results related to the evaluation of various bounds. Some of 
the results run contrary to popular intuition, and hence BISO channels can serve as a simple class 
with which to improve our understanding of these relations. We also use perturbation-based 
arguments to show the optimality of certain auxiliary channels, thus generalizing earlier results. 
We hope that some of the results presented here will prompt a careful rethinking of the various notions 
of dominance between receivers. 
Appendix 

A.1 Proof of Lemma 7 

Proof. Let $\mathcal{U} = \{1, 2, \dots, m\}$, $P(U = i) = \mu_i$ and $P(X = 0|U = i) = s_i$. Further, let $h(x) = 
-x \log_2 x - (1 - x) \log_2 (1 - x)$ be the binary entropy function and let $*$ denote the binary convolution, 
i.e., $a * b = a(1 - b) + b(1 - a)$. 
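Using this notation, we have the following expansions: 

$$I(X;Y_1|U) = \sum_k (q_k + q_{-k}) \sum_i \mu_i \left[ h\left(\frac{q_{-k}}{q_k + q_{-k}} * s_i\right) - h\left(\frac{q_{-k}}{q_k + q_{-k}}\right) \right],$$

$$I(X;Y_1) = \sum_k (q_k + q_{-k}) \left[ h\left(\frac{q_{-k}}{q_k + q_{-k}} * \sum_i \mu_i s_i\right) - h\left(\frac{q_{-k}}{q_k + q_{-k}}\right) \right].$$

Define $\tilde{\mathcal{U}} = \{1, 2, \dots, m\} \times \{1, 2\}$, $P(\tilde{U} = (i, 1)) = \mu_i/2$, $P(\tilde{X} = 0|\tilde{U} = (i, 1)) = s_i$, $P(\tilde{U} = 
(i, 2)) = \mu_i/2$, and $P(\tilde{X} = 0|\tilde{U} = (i, 2)) = 1 - s_i$. This induces an $\tilde{X}$ with $P(\tilde{X} = 0) = 1/2$, and it is 
straightforward to verify that 

$$I(\tilde{U};Y_2) \ge I(U;Y_2), \qquad I(\tilde{X};Y_1|\tilde{U}) = I(X;Y_1|U), \qquad I(\tilde{X};Y_1) \ge I(X;Y_1).$$

Thus, for every $U$, replacing $(U, X)$ by $(\tilde{U}, \tilde{X})$ leads to a larger region. 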

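Hence it suffices to maximize over all auxiliary random variables of the form $(\tilde{U}, \tilde{X})$ defined above, i.e., 
$\tilde{\mathcal{U}} = \{1, 2, \dots, m\} \times \{1, 2\}$, $P(\tilde{U} = (i, 1)) = P(\tilde{U} = (i, 2)) = \mu_i/2$, $P(\tilde{X} = 0|\tilde{U} = (i, 1)) = s_i$ and 
$P(\tilde{X} = 0|\tilde{U} = (i, 2)) = 1 - s_i$. Let this class of random variables $(\tilde{U}, \tilde{X})$ be $\mathcal{Q}$; below we drop the tildes. 

Since $P(X = 0) = 1/2$ remains fixed, the right-hand side of the third inequality, $I(X;Y_1) = C$, remains constant. Therefore, to compute 
the extreme points, we look for the distribution of $(U, X)$ in $\mathcal{Q}$ that maximizes 
$\lambda I(U;Y_2) + (I(U;Y_2) + I(X;Y_1|U))$ for $\lambda \ge 0$. 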

For a given $p(u, x) \in \mathcal{Q}$ with $|\mathcal{U}| = 2m$, consider the multiplicative Lyapunov perturbation defined 
by 

$$\begin{aligned}
r(U = (i, 1), X = 0) &= p(U = (i, 1), X = 0)(1 + \epsilon L(i)), \\
r(U = (i, 1), X = 1) &= p(U = (i, 1), X = 1)(1 + \epsilon L(i)), \\
r(U = (i, 2), X = 0) &= r(U = (i, 1), X = 1), \\
r(U = (i, 2), X = 1) &= r(U = (i, 1), X = 0).
\end{aligned}$$

For $r(u, x)$ to be a valid probability distribution we require $1 + \epsilon L(i) \ge 0$ for all $i$ and 

$$\sum_{i=1}^{m} P(U = (i, 1)) L(i) = 0.$$

Observe that the perturbation maintains $P(X = 0)$, and the new pair $r(u, x)$ also belongs 
to $\mathcal{Q}$. A non-trivial $L$ exists if $m \ge 2$. 
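Observe that 

$$\begin{aligned}
(\lambda + 1) I_r(U;Y_2) + I_r(X;Y_1|U) &= (\lambda + 1) H_p(Y_2) + \lambda H_p(U) + H_p(U, Y_1) - (\lambda + 1) H_p(U, Y_2) - H(Y_1|X) \\
&\quad + \epsilon \left(\lambda H_L(U) + H_L(U, Y_1) - (\lambda + 1) H_L(U, Y_2)\right),
\end{aligned}$$

where 

$$\begin{aligned}
H_L(U) &= -\sum_i p(i) L(i) \log_2 p(i), \\
H_L(U, Y_1) &= -\sum_{i, y_1} p(i, y_1) L(i) \log_2 p(i, y_1), \\
H_L(U, Y_2) &= -\sum_{i, y_2} p(i, y_2) L(i) \log_2 p(i, y_2).
\end{aligned}$$

(The terms involving $\log_2(1 + \epsilon L(i))$ cancel because the coefficients of $H(U)$, $H(U, Y_1)$ and $H(U, Y_2)$ sum to zero; hence the expression is exactly linear in $\epsilon$.) 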
Setting the first derivative with respect to $\epsilon$ to zero implies 

$$\lambda H_L(U) + H_L(U, Y_1) - (\lambda + 1) H_L(U, Y_2) = 0,$$

and, since the expression is linear in $\epsilon$, this further implies that if $p(u, x)$ achieves the maximum of $(\lambda + 1) I_p(U;Y_2) + I_p(X;Y_1|U)$, then 
$(\lambda + 1) I_r(U;Y_2) + I_r(X;Y_1|U) = (\lambda + 1) I_p(U;Y_2) + I_p(X;Y_1|U)$ for any valid perturbation that 
stays in $\mathcal{Q}$. 

Now choose $\epsilon$ such that $\min_i (1 + \epsilon L(i)) = 0$, and let $i = i^*$ achieve this minimum. Observe 
that $r(U = (i^*, 1)) = r(U = (i^*, 2)) = 0$, and hence there exists a $U$ with cardinality $2(m - 1)$ for which $(\lambda + 1) I(U;Y_2) + I(X;Y_1|U)$ takes the same (maximal) value. We can proceed by induction until $m = 1$. 

Since $(U, X) \in \mathcal{Q}$ and $|\mathcal{U}| = 2$, the optimal auxiliary channel $U \to X$ follows the 
distribution 

$$P(U = 1) = P(U = 2) = \frac{1}{2}, \qquad P(X = 0|U = 1) = P(X = 1|U = 2) = s,$$

i.e., $U \to X \sim \mathrm{BSC}(s)$. □ 

The same proof can also be used to establish the following lemma. 

Lemma 8. Consider a 2-receiver broadcast channel where $X \to Y_1$ and $X \to Y_2$ are 
BISO channels with transition probabilities $\{q_k, q_{-k} : 1 \le k \le N\}$ and $\{p_j, p_{-j} : 1 \le j \le N\}$, 
respectively. Consider the superposition coding region formed by taking the union of rate 
pairs $(R_1, R_2)$ satisfying 

$$\begin{aligned}
R_2 &\le I(U;Y_2), \\
R_2 + R_1 &\le I(U;Y_2) + I(X;Y_1|U), \\
R_2 + R_1 &\le I(X;Y_1),
\end{aligned}$$

over all $p(u)p(x|u)p(y_1, y_2|x)$. Then the same region can be realized by restricting to a binary $U$ 
such that $U \to X \sim \mathrm{BSC}(s)$ and $P(X = 0) = 1/2$. 

Remark 7. This generalizes the result of Wyner and Ziv [10] for BSC broadcast channels. In [2] 
it was shown that superposition coding is indeed optimal when the two channels are more capable 
comparable. 

A.2 Proof of Theorem 3 

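Proof. (a) $\Rightarrow$ (b): Recall that 

$$I = \{s \in [0, 0.5] : f_1(s) > f_2(s)\}, \qquad J = \{s \in [0, 0.5] : f_1(s) < f_2(s)\}.$$

Since the channels are not more capable comparable, we know that there exist $s_1 \in I$ and $s_2 \in J$. 
Construct $U \to X$, where $U = U' \times Q$ with binary $U'$ and $Q$, and probabilities 

$$\begin{aligned}
P(U = (0, 0)) &= \tfrac{1 - \epsilon}{2}, & P(X = 0|U = (0, 0)) &= 1, \\
P(U = (0, 1)) &= \tfrac{\epsilon}{2}, & P(X = 0|U = (0, 1)) &= s_1, \\
P(U = (1, 0)) &= \tfrac{1 - \epsilon}{2}, & P(X = 1|U = (1, 0)) &= 1, \\
P(U = (1, 1)) &= \tfrac{\epsilon}{2}, & P(X = 1|U = (1, 1)) &= s_1.
\end{aligned}$$

Thus $U' \to X \sim \mathrm{BSC}(0)$ conditioned on the event $Q = 0$, $U' \to X \sim \mathrm{BSC}(1 - s_1)$ conditioned on 
$Q = 1$, and $U'$ is independent of $Q$ with $P(U' = 0) = 1/2$ and $P(Q = 1) = \epsilon$. We can see that $Q$ is independent 
of $X$, and hence of $(Y_1, Y_2)$; thus $I(Q;Y_1) = I(Q;Y_2) = 0$. Now 

$$\begin{aligned}
I(U;Y_1) &= I(U', Q;Y_1) = I(U';Y_1|Q) + I(Q;Y_1) \\
&= I(U';Y_1|Q) \\
&= (1 - \epsilon) I(X;Y_1) + \epsilon I(U';Y_1|Q = 1) \\
&= (1 - \epsilon) C + \epsilon f_1(s_1).
\end{aligned}$$

Similarly, we obtain 

$$I(U;Y_2) = (1 - \epsilon) C + \epsilon f_2(s_1).$$

Thus, taking $V \to X \sim \mathrm{BSC}(s_2)$ in OB, we have 

$$\begin{aligned}
R_1 &\le (1 - \epsilon) C + \epsilon f_1(s_1), \\
R_2 &\le f_2(s_2), \\
R_1 + R_2 &\le I(U;Y_1) + I(X;Y_2|U) = I(U;Y_1) + I(X;Y_2) - I(U;Y_2) \\
&= (1 - \epsilon) C + \epsilon f_1(s_1) + C - [(1 - \epsilon) C + \epsilon f_2(s_1)] \\
&= C + \epsilon [f_1(s_1) - f_2(s_1)] \quad (> C), \\
R_1 + R_2 &\le I(V;Y_2) + I(X;Y_1|V) = f_2(s_2) + C - f_1(s_2) \quad (> C).
\end{aligned}$$

To show that $(1 - \epsilon) C + \epsilon f_1(s_1) + f_2(s_2) > C$, we just need to choose $\epsilon$ small enough to ensure 
$f_2(s_2) > \epsilon [C - f_1(s_1)]$. This is clearly possible, since $f_2(s_2) > f_1(s_2) \ge 0$; hence TD $\subsetneq$ OB. 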

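(b) $\Rightarrow$ (c): From (3), the boundary of the outer 
bound is described by 

$$\begin{aligned}
R_1 &\le I(U;Y_1) = f_1(s_1), \\
R_2 &\le I(V;Y_2) = f_2(s_2), \\
R_1 + R_2 &\le I(U;Y_1) + I(X;Y_2|U) = f_1(s_1) + C - f_2(s_1), \\
R_1 + R_2 &\le I(V;Y_2) + I(X;Y_1|V) = f_2(s_2) + C - f_1(s_2).
\end{aligned}$$

If $s_1 \notin I$ or $s_2 \notin J$, one of the two sum-rate constraints is already at most $C$. Hence, if $f_1(s_1) + f_2(s_2) \le C$ for every $s_1 \in I$, $s_2 \in J$, then every pair in OB has $R_1 + R_2 \le C$, i.e., OB = TD. However, 
since TD $\subsetneq$ OB, there exist $s_1 \in I$, $s_2 \in J$ such that $f_1(s_1) + f_2(s_2) > C$. 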

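(c) $\Rightarrow$ (d): In general, TD $\subseteq$ RTD $\subseteq$ MIB. So it now suffices to show that the maximum sum rate of the RTD region is strictly larger than that of the TD region, which is $C$. 

We now compute the maximum sum rate of the RTD region; from Lemma 6, it 
matches the maximum sum rate of the MIB region. 
Consider an auxiliary channel $W \to X$ such that 

$$P(W = 0) = \alpha, \quad P(W = 1) = 1 - \alpha, \qquad P(X = 0|W = 0) = s_2, \quad P(X = 0|W = 1) = 1 - s_1,$$

where $\alpha s_2 + (1 - \alpha)(1 - s_1) = 1/2$, so that $P(X = 0) = 1/2$. Since $s_1, s_2 < 1/2$ (note that $f_1(1/2) = f_2(1/2) = 0$), this constraint has a solution $0 < \alpha < 1$; by the symmetry $f_i(s) = f_i(1 - s)$, the choice of $1 - s_1$ rather than $s_1$ does not affect any of the quantities below. 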
It is straightforward to check that 

$$\begin{aligned}
I(X;Y_1|W = 0) &= C - f_1(s_2), & I(X;Y_1|W = 1) &= C - f_1(s_1), \\
I(X;Y_2|W = 0) &= C - f_2(s_2), & I(X;Y_2|W = 1) &= C - f_2(s_1), \\
I(X;Y_1) &= I(X;Y_2) = C.
\end{aligned}$$

Then observe that 

$$\begin{aligned}
&I(W;Y_1) + P(W = 0) I(X;Y_1|W = 0) + P(W = 1) I(X;Y_2|W = 1) \\
&\quad = I(X;Y_1) + P(W = 1)\left(I(X;Y_2|W = 1) - I(X;Y_1|W = 1)\right) \\
&\quad = C + (1 - \alpha)(f_1(s_1) - f_2(s_1)) > C,
\end{aligned}$$

where the last inequality holds since $s_1 \in I$. Similarly, 

$$I(W;Y_2) + P(W = 0) I(X;Y_1|W = 0) + P(W = 1) I(X;Y_2|W = 1) = C + \alpha(f_2(s_2) - f_1(s_2)) > C,$$

since $s_2 \in J$. Therefore the sum rate of RTD (equivalently, of MIB) for this choice of $(W, X)$ is given by 

$$C + \min\{(1 - \alpha)(f_1(s_1) - f_2(s_1)),\ \alpha(f_2(s_2) - f_1(s_2))\}. \tag{5}$$

Therefore, if (c) is satisfied, i.e., there exist $s_1 \in I$, $s_2 \in J$, then there exists a $(W, X)$ for which 
(5) gives a sum rate strictly larger than $C$. 
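The RTD sum rate (5) can be recomputed from first principles for the channels of Remark 2 (as read there). The sketch below takes $s_1 = 0.05 \in I$ and $s_2 = 0.3 \in J$, our own choice from a numerical scan of that pair, sets $P(X = 0|W = 1) = 1 - s_1$ and chooses $\alpha$ so that $X$ is uniform. It then evaluates the RTD sum-rate bound directly from the joint distribution of $(W, X, Y_i)$ and compares it with (5), written with the two (nearly equal) capacities kept separate. The helpers are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def output_dist(px0, channel):
    """Distribution of Y when P(X=0) = px0; channel maps y -> (P(y|X=0), P(y|X=1))."""
    return [px0 * a + (1 - px0) * b for a, b in channel.values()]


def mutual_info(px0, channel):
    """I(X;Y) = H(Y) - H(Y|X) for binary X."""
    h_y_given_x = (px0 * entropy([a for a, _ in channel.values()])
                   + (1 - px0) * entropy([b for _, b in channel.values()]))
    return entropy(output_dist(px0, channel)) - h_y_given_x


def i_w_y(alpha, x0, x1, channel):
    """I(W;Y) for P(W=0) = alpha, P(X=0|W=0) = x0, P(X=0|W=1) = x1."""
    d0, d1 = output_dist(x0, channel), output_dist(x1, channel)
    mixture = [alpha * p + (1 - alpha) * q for p, q in zip(d0, d1)]
    return entropy(mixture) - alpha * entropy(d0) - (1 - alpha) * entropy(d1)


def rtd_sum_rate(alpha, x0, x1, ch1, ch2):
    """RTD sum-rate bound: slots with W = 0 serve receiver 1, slots with W = 1 serve receiver 2."""
    private = alpha * mutual_info(x0, ch1) + (1 - alpha) * mutual_info(x1, ch2)
    return min(i_w_y(alpha, x0, x1, ch1), i_w_y(alpha, x0, x1, ch2)) + private


def biso(p_given_0):
    """BISO channel from P(Y=k|X=0), using P(Y=k|X=1) = P(Y=-k|X=0)."""
    outputs = sorted(set(p_given_0) | {-k for k in p_given_0})
    return {k: (p_given_0.get(k, 0.0), p_given_0.get(-k, 0.0)) for k in outputs}


Y1 = biso({-2: 0.061, -1: 0.195, 1: 0.195, 2: 0.549})
b_m2 = 0.0634977
Y2 = biso({-2: b_m2, -1: (1 - b_m2) / 5, 1: 4 * (1 - b_m2) / 5})

s1, s2 = 0.05, 0.3
alpha = (0.5 - s1) / (1 - s1 - s2)  # alpha*s2 + (1 - alpha)*(1 - s1) = 1/2
rtd = rtd_sum_rate(alpha, s2, 1 - s1, Y1, Y2)

C1, C2 = mutual_info(0.5, Y1), mutual_info(0.5, Y2)
A = mutual_info(s1, Y2) - mutual_info(s1, Y1)  # f1(s1) - f2(s1) when C1 = C2
B = mutual_info(s2, Y1) - mutual_info(s2, Y2)  # f2(s2) - f1(s2) when C1 = C2
eq5 = min(C1 + (1 - alpha) * A, C2 + alpha * B)
print(f"RTD sum rate = {rtd:.7f}, formula (5) = {eq5:.7f}, C = {C1:.7f}")
```

The gain over $C$ is small for this pair, but it is strictly positive, as the argument requires.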

Remark 8. A careful reader will notice that the above argument only requires $s_1 \in I$, $s_2 \in J$, and 
does not even require $f_1(s_1) + f_2(s_2) > C$. But the existence of any $s_1 \in I$, $s_2 \in J$ implies that (a) 
holds, and hence (c) holds. 

(d) $\Rightarrow$ (e): Since TD $\subsetneq$ MIB, to compute the maximum sum rate of MIB it suffices to maximize, 
over $s_1 \in I$, $s_2 \in J$, $0 \le \alpha \le 1$, the term 

$$C + \min\{(1 - \alpha)(f_1(s_1) - f_2(s_1)),\ \alpha(f_2(s_2) - f_1(s_2))\}.$$

Consider any triple $s_1 \in I$, $s_2 \in J$, $0 < \alpha < 1$. Pick any $\epsilon > 0$ small enough (in particular $\epsilon < \min\{\alpha, 1 - \alpha\}$; we show later how 
small it must be). 

Define $(U, X) = (Q, U_1, X)$, where $P(Q = 0) = 1 - \alpha + \epsilon$, $P(Q = 1) = \alpha - \epsilon$; $U_1 \to X \sim \mathrm{BSC}(s_1)$ conditioned on $Q = 0$, and $U_1 \to X \sim \mathrm{BSC}(0)$ conditioned on $Q = 1$. Further take 
$P(U_1 = 0) = P(U_1 = 1) = 1/2$. Observe that this induces $P(X = 0) = P(X = 1) = 1/2$. 

Similarly, define $(V, X) = (Q', V_1, X)$, where $P(Q' = 0) = \alpha + \epsilon$, $P(Q' = 1) = 1 - \alpha - \epsilon$; $V_1 \to X \sim \mathrm{BSC}(s_2)$ conditioned on $Q' = 0$, and $V_1 \to X \sim \mathrm{BSC}(0)$ conditioned on $Q' = 1$. Further 
take $P(V_1 = 0) = P(V_1 = 1) = 1/2$. Observe that this also induces $P(X = 0) = P(X = 1) = 1/2$. 

Since the distribution of $X$ is consistent, there exists a triple $(U, V, X)$ with the same pairwise 
marginals $(U, X)$ and $(V, X)$ as described above. With this choice, OB reduces to 

$$\begin{aligned}
R_1 &\le I(U;Y_1) = (1 - \alpha + \epsilon) f_1(s_1) + (\alpha - \epsilon) C, \\
R_2 &\le I(V;Y_2) = (\alpha + \epsilon) f_2(s_2) + (1 - \alpha - \epsilon) C, \\
R_1 + R_2 &\le I(U;Y_1) + I(X;Y_2|U) = C + (1 - \alpha + \epsilon)(f_1(s_1) - f_2(s_1)), \\
R_1 + R_2 &\le I(V;Y_2) + I(X;Y_1|V) = C + (\alpha + \epsilon)(f_2(s_2) - f_1(s_2)).
\end{aligned}$$

Clearly, the maximum sum rate of the above region is the minimum of the terms 

$$\left\{C + (1 - \alpha + \epsilon)(f_1(s_1) - f_2(s_1)),\ C + (\alpha + \epsilon)(f_2(s_2) - f_1(s_2)),\ (1 - 2\epsilon) C + (1 - \alpha + \epsilon) f_1(s_1) + (\alpha + \epsilon) f_2(s_2)\right\}.$$
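We pick $\epsilon > 0$ to satisfy 

$$(1 - 2\epsilon) C + (1 - \alpha + \epsilon) f_1(s_1) + (\alpha + \epsilon) f_2(s_2) > C + (1 - \alpha)(f_1(s_1) - f_2(s_1)) \iff (1 - \alpha) f_2(s_1) + \alpha f_2(s_2) > \epsilon (2C - f_1(s_1) - f_2(s_2)),$$

and 

$$(1 - 2\epsilon) C + (1 - \alpha + \epsilon) f_1(s_1) + (\alpha + \epsilon) f_2(s_2) > C + \alpha(f_2(s_2) - f_1(s_2)) \iff \alpha f_1(s_2) + (1 - \alpha) f_1(s_1) > \epsilon (2C - f_1(s_1) - f_2(s_2)).$$

Both are possible because the left-hand sides are positive ($f_1(s_1) > f_2(s_1) \ge 0$ and $f_2(s_2) > f_1(s_2) \ge 0$). Then the maximum sum rate of the OB expression is strictly larger than the MIB sum rate for the same triple. 
Since this holds for every $s_1 \in I$, $s_2 \in J$, $0 < \alpha < 1$, the maximum sum rate of OB is strictly 
larger than that of MIB. Therefore MIB $\subsetneq$ OB, i.e., (e) holds. 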

(e) $\Rightarrow$ (a): MIB $\subsetneq$ OB implies that the channels are not more capable comparable. 
This is because, when the channels are more capable comparable, we know from [2] that superposition 
coding is optimal and MIB = CR = OB. □ 